AITopics | convolution block

convolution block

Deeply Shared Filter Bases for Parameter-Efficient Convolutional Neural Networks

Neural Information Processing Systems, Apr-25-2026, 13:15:12 GMT

Modern convolutional neural networks (CNNs) contain many identical convolution blocks, and hence recursive sharing of parameters across these blocks has been proposed to reduce the number of parameters. However, naive parameter sharing poses challenges such as limited representational power and vanishing/exploding gradients in the recursively shared parameters. In this paper, we present a recursive convolution block design and training method in which a recursively shareable part, or filter basis, is separated and learned while effectively avoiding the vanishing/exploding gradients problem during training. We show that the vanishing/exploding gradients problem can be controlled by enforcing orthonormality on the elements of the filter basis, and empirically demonstrate that the proposed orthogonality regularization improves the flow of gradients during training. Experimental results on image classification and object detection show that our approach, unlike previous parameter-sharing approaches, does not trade performance for parameter savings and consistently outperforms overparameterized counterpart networks. This superior performance demonstrates that the proposed recursive convolution block design and orthogonality regularization not only prevent performance degradation but also consistently improve representation capability while a significant number of parameters are recursively shared.
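The abstract does not give the exact form of the orthogonality regularization, so the following is only a minimal sketch of one common way to penalize a non-orthonormal basis: the squared Frobenius norm of the Gram matrix minus the identity. The function names `gram_matrix` and `orthonormality_penalty`, and the representation of each basis filter as a flattened list of floats, are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def gram_matrix(basis: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise inner products of the flattened basis filters."""
    return [[sum(a * b for a, b in zip(u, v)) for v in basis] for u in basis]


def orthonormality_penalty(basis: Sequence[Sequence[float]]) -> float:
    """Squared Frobenius norm of (G - I), where G is the Gram matrix.

    The value is zero exactly when the filters are mutually orthogonal
    and each has unit norm (assumed penalty form, not the paper's).
    """
    gram = gram_matrix(basis)
    n = len(gram)
    return sum(
        (gram[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(n)
        for j in range(n)
    )


# Two hypothetical 3-element "filters" per basis, purely for illustration.
orthonormal_basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
skewed_basis = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

print(orthonormality_penalty(orthonormal_basis))  # no penalty
print(orthonormality_penalty(skewed_basis))       # positive penalty
```

In a training loop, a term of this kind would typically be scaled by a small coefficient and added to the task loss; the coefficient and schedule are not stated in the abstract.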

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Country:

Asia > South Korea > Incheon > Incheon (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)

DF-Mamba: Deformable State Space Modeling for 3D Hand Pose Estimation in Interactions

Zhou, Yifan, Ohkawa, Takehiko, Zhou, Guwenxiao, Goto, Kanoko, Hirose, Takumi, Sekikawa, Yusuke, Inoue, Nakamasa

arXiv.org Artificial Intelligence, Dec-3-2025

Modeling daily hand interactions often struggles with severe occlusions, such as when two hands overlap, which highlights the need for robust feature learning in 3D hand pose estimation (HPE). To handle such occluded hand images, it is vital to effectively learn the relationship between local image features (e.g., for occluded joints) and global context (e.g., cues from inter-joints, inter-hands, or the scene). However, most current 3D HPE methods still rely on ResNet for feature extraction, and such CNN's inductive bias may not be optimal for 3D HPE due to its limited capability to model the global context. To address this limitation, we propose an effective and efficient framework for visual feature extraction in 3D HPE using recent state space modeling (i.e., Mamba), dubbed Deformable Mamba (DF-Mamba). DF-Mamba is designed to capture global context cues beyond standard convolution through Mamba's selective state modeling and the proposed deformable state scanning. Specifically, for local features after convolution, our deformable scanning aggregates these features within an image while selectively preserving useful cues that represent the global context. This approach significantly improves the accuracy of structured 3D HPE, with comparable inference speed to ResNet-50. Our experiments involve extensive evaluations on five divergent datasets including single-hand and two-hand scenarios, hand-only and hand-object interactions, as well as RGB and depth-based estimation. DF-Mamba outperforms the latest image backbones, including VMamba and Spatial-Mamba, on all datasets and achieves state-of-the-art performance.

artificial intelligence, computer vision, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2512.02727

Country: Asia (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

eec7fee9a8595ca964b9a11562767345-Supplemental-Conference.pdf

Neural Information Processing Systems, Aug-19-2025, 17:41:37 GMT

artificial intelligence, machine learning, secret image, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.97)

e8507db80464ced5658d16b49bd458b9-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Aug-19-2025, 15:27:12 GMT

Interestingly, for HMDB51, the Synthetic pre-train dataset has more overlapping classes, yet the Kinetics pre-trained model still outperforms on this downstream task.

artificial intelligence, downstream task, machine learning, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Look More but Care Less in Video Recognition

Neural Information Processing Systems, Aug-18-2025, 20:12:47 GMT

With this design, we can feed more frames to the network at a lower computational cost.

artificial intelligence, machine learning, navigation module, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Cardiac MRI Semantic Segmentation for Ventricles and Myocardium using Deep Learning

Mukisa, Racheal, Bansal, Arvind K.

arXiv.org Artificial Intelligence, Apr-21-2025

Automated noninvasive cardiac diagnosis plays a critical role in the early detection of cardiac disorders and cost-effective clinical management. Automated diagnosis involves the automated segmentation and analysis of cardiac images. Precise delineation of cardiac substructures and extraction of their morphological attributes are essential for evaluating the cardiac function, and diagnosing cardiovascular disease such as cardiomyopathy, valvular diseases, abnormalities related to septum perforations, and blood-flow rate. Semantic segmentation labels the CMR image at the pixel-level, and localizes its subcomponents to facilitate the detection of abnormalities, including abnormalities in cardiac wall motion in an aging heart with muscle abnormalities, vascular abnormalities, and valvular abnormalities. In this paper, we describe a model to improve semantic segmentation of CMR images. The model extracts edge-attributes and context information during down-sampling of the U-Net and infuses this information during up-sampling to localize three major cardiac structures: left ventricle cavity (LV); right ventricle cavity (RV); and LV myocardium (LMyo). We present an algorithm and performance results. A comparison of our model with previous leading models, using similarity-metrics between actual image and segmented image, shows that our approach improves Dice similarity coefficient (DSC) by 2%-11% and lowers Hausdorff distance (HD) by 1.6-5.7 mm.
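The two metrics named in the abstract have standard definitions, and a minimal sketch of them on small binary masks may help make the reported numbers concrete. The helper names (`foreground`, `dice`, `hausdorff`) and the toy masks are hypothetical. This version measures Hausdorff distance over all foreground pixels in pixel units, whereas the abstract reports millimetres, which would require the scan's pixel spacing; the paper's exact evaluation protocol is not described in the abstract.

```python
import math
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
Mask = List[List[int]]


def foreground(mask: Mask) -> Set[Pixel]:
    """Coordinates (row, col) of all non-zero pixels in a binary mask."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice(pred: Mask, truth: Mask) -> float:
    """Dice similarity coefficient: 2|P ∩ T| / (|P| + |T|)."""
    p, t = foreground(pred), foreground(truth)
    if not p and not t:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(p & t) / (len(p) + len(t))


def hausdorff(pred: Mask, truth: Mask) -> float:
    """Symmetric Hausdorff distance between foreground sets, in pixels."""
    p, t = foreground(pred), foreground(truth)
    if not p or not t:
        raise ValueError("Hausdorff distance is undefined for an empty mask")

    def directed(a: Set[Pixel], b: Set[Pixel]) -> float:
        # Largest distance from a point in `a` to its nearest point in `b`.
        return max(min(math.dist(x, y) for y in b) for x in a)

    return max(directed(p, t), directed(t, p))


# Toy 2x4 masks, purely illustrative.
truth_mask = [[0, 1, 1, 0],
              [0, 1, 1, 0]]
pred_mask = [[0, 1, 1, 1],
             [0, 1, 0, 0]]

print(dice(pred_mask, truth_mask))       # overlap score in [0, 1]
print(hausdorff(pred_mask, truth_mask))  # worst-case boundary error in pixels
```

A higher DSC and a lower HD both indicate better agreement, which is why the abstract reports an increase in the first and a decrease in the second.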

artificial intelligence, machine learning, segmentation, (20 more...)

arXiv.org Artificial Intelligence

2504.13391

Country:

Europe (1.00)
North America > United States > California > Los Angeles County (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech

Kim, Ji-Hoon, Choi, Jeongsoo, Kim, Jaehun, Jung, Chaeyoung, Chung, Joon Son

arXiv.org Artificial Intelligence, Mar-21-2025

The objective of this study is to generate high-quality speech from silent talking face videos, a task also known as video-to-speech synthesis. A significant challenge in video-to-speech synthesis lies in the substantial modality gap between silent video and multi-faceted speech. In this paper, we propose a novel video-to-speech system that effectively bridges this modality gap, significantly enhancing the quality of synthesized speech. This is achieved by learning hierarchical representations from video to speech. Specifically, we gradually transform silent video into acoustic feature spaces through three sequential stages: content, timbre, and prosody modeling. In each stage, we align visual factors (lip movements, face identity, and facial expressions) with their corresponding acoustic counterparts to ensure a seamless transformation. Additionally, to generate realistic and coherent speech from the visual representations, we employ a flow matching model that estimates direct trajectories from a simple prior distribution to the target speech distribution. Extensive experiments demonstrate that our method achieves exceptional generation quality comparable to real utterances, outperforming existing methods by a significant margin.
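The abstract does not state which flow matching formulation is used, so the following is only a sketch of one widely used variant, assuming a straight-line path between a prior sample `x0` and a target sample `x1`. Under that assumption the point at time `t` is `(1 - t) * x0 + t * x1` and the regression target for the velocity model is `x1 - x0`. All names here (`interpolate`, `target_velocity`, `flow_matching_loss`, `predict_velocity`) are hypothetical, and the vectors stand in for acoustic features only loosely.

```python
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight line from x0 (prior) to x1 (target) at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Constant velocity of the straight-line path, independent of t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(
    predict_velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    x1: Vector,
    t: float,
) -> float:
    """Mean squared error between predicted and target velocity at x_t."""
    xt = interpolate(x0, x1, t)
    predicted = predict_velocity(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


# Toy 2-dimensional example with two stand-in "models".
prior_sample = [0.0, 0.0]
speech_sample = [2.0, 4.0]


def oracle_model(xt: Vector, t: float) -> Vector:
    return [2.0, 4.0]  # happens to equal the true velocity for this pair


def zero_model(xt: Vector, t: float) -> Vector:
    return [0.0, 0.0]  # predicts no motion at all


print(flow_matching_loss(oracle_model, prior_sample, speech_sample, 0.25))
print(flow_matching_loss(zero_model, prior_sample, speech_sample, 0.25))
```

In practice the velocity model would be a neural network conditioned on the visual representations, and `x0` and `t` would be sampled randomly at each training step; those details are not given in the abstract.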

artificial intelligence, machine learning, proc, (17 more...)

arXiv.org Artificial Intelligence

2503.16956

Country: